Automatic database acquisition software for ISDN PC cards and analogue boards
This paper describes an application for automatic speech database acquisition (ADA) developed by the authors within the framework of the EC Telematics Project SpeechDat II. The software works with standard, inexpensive PC cards for ISDN lines as well as with Dialogic boards for analogue telephone lines. Both program versions share a common file format and configuration. Other important characteristics of the recording software are its simple set-up, fast and flexible configuration of the recording session, real-time monitoring of calls and disk space, and its proven robustness.
Peer Reviewed. Postprint (published version)
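The abstract lists real-time disk-space monitoring among the features of the recording software but does not say how it is done. A minimal sketch, assuming each ISDN B channel is stored as 8 kHz, 8-bit A-law audio (64 kbit/s, i.e. 8000 bytes per second per channel), converts free disk space into remaining recording minutes. The function names, the 50 MiB reserve and the 10-minute threshold are hypothetical choices, not taken from ADA.

```python
import shutil

# Assumption: one ISDN B channel carries 64 kbit/s, i.e. 8 kHz, 8-bit A-law audio.
BYTES_PER_SECOND_PER_CHANNEL = 8000


def recording_minutes_left(free_bytes, reserve_bytes=0, channels=1,
                           bytes_per_second=BYTES_PER_SECOND_PER_CHANNEL):
    """Minutes of simultaneous recording that fit after keeping reserve_bytes unused."""
    usable = max(free_bytes - reserve_bytes, 0)
    return usable / (bytes_per_second * channels) / 60.0


def disk_ok(path=".", min_minutes=10.0, reserve_bytes=50 * 1024 * 1024, channels=1):
    """True if the volume holding `path` can still take at least min_minutes of recording."""
    free = shutil.disk_usage(path).free
    return recording_minutes_left(free, reserve_bytes, channels) >= min_minutes


if __name__ == "__main__":
    free = shutil.disk_usage(".").free
    print(f"about {recording_minutes_left(free, channels=2):.0f} minutes left on 2 channels")
```

Expressing the check in minutes rather than bytes lets an operator compare it directly with the expected length of a recording session.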
Synthesis using speaker adaptation from speech recognition DB
This paper deals with the creation of multiple voices from a Hidden Markov Model (HMM) based speech synthesis system (HTS). More than 150 Catalan synthetic voices were built using HMMs and speaker adaptation techniques. Training data for building a speaker-independent (SI) model were selected both from a general-purpose speech synthesis database (FestCat) and from a database designed for training automatic speech recognition (ASR) systems (the Catalan SpeeCon database). The SpeeCon database was also used to adapt the SI model to different speakers. Using a database designed for ASR for TTS purposes provided many different amateur voices, each with only a few minutes of recordings not made under studio conditions. This paper shows how speaker adaptation techniques provide the right tools to generate multiple voices from very little adaptation data. A subjective evaluation was carried out to assess the intelligibility and naturalness of the generated voices, as well as the similarity of the adapted voices to both the original speaker and the average voice of the SI model.
Peer Reviewed. Postprint (published version)
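The abstract does not state which adaptation algorithm was applied. To illustrate why a few minutes of speech can still shift a model usefully, this sketch shows the textbook MAP update of a single Gaussian mean, μ̂ = (τ·μ_SI + Σ_t γ_t x_t) / (τ + Σ_t γ_t), where γ_t is the occupation probability of the Gaussian for frame x_t. With little data the estimate stays close to the SI mean; as occupation grows it moves toward the speaker's own statistics. The function name and the default τ = 10 are assumptions, and this is not presented as the authors' method.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of one Gaussian mean.

    prior_mean -- speaker-independent (SI) mean vector
    frames     -- adaptation feature vectors x_t
    posteriors -- occupation probability gamma_t of this Gaussian for each frame
    tau        -- prior weight (hypothetical default); larger values keep the SI mean longer
    """
    if len(frames) != len(posteriors):
        raise ValueError("need one posterior per frame")
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    weighted_sum = [sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)]
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy) for d in range(dim)]


if __name__ == "__main__":
    si_mean = [0.0, 0.0]
    speaker_frames = [[1.0, 2.0]] * 10
    # 10 frames fully assigned to this Gaussian with tau = 10: halfway to the speaker mean
    print(map_adapt_mean(si_mean, speaker_frames, [1.0] * 10))  # [0.5, 1.0]
```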
FIR system identification using a linear combination of cumulants
A general linear approach to identifying the parameters of a moving average (MA) model from the output statistics is developed. It is shown that, under some constraints, the impulse response of the system can be expressed as a linear combination of cumulant slices. This result is then used to obtain a new, well-conditioned linear method for estimating the MA parameters of a non-Gaussian process. The proposed approach does not require a prior estimate of the filter order. Simulation results show improved performance with respect to existing methods.
Peer Reviewed. Postprint (published version)
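The abstract does not give the linear combination of cumulant slices itself, so it is not reproduced here. For comparison, this sketch implements the classical C(q,k) formula, which recovers MA(q) coefficients from a single third-order cumulant slice as b(k) = c3(q, k) / c3(q, 0) with b(0) = 1. Unlike the proposed approach, it needs the order q in advance and relies on only a few cumulant values, so its estimates tend to be noisy. All function names are hypothetical, and the zero-mean exponential input is an arbitrary choice of skewed, non-Gaussian noise.

```python
import random


def ma_cumulant3(b, t1, t2, gamma3=1.0):
    """Theoretical c3(t1, t2) = gamma3 * sum_n b(n) b(n+t1) b(n+t2), b = [b(0), ..., b(q)]."""
    q = len(b) - 1

    def coef(n):
        return b[n] if 0 <= n <= q else 0.0

    return gamma3 * sum(coef(n) * coef(n + t1) * coef(n + t2) for n in range(q + 1))


def sample_cumulant3(x, t1, t2):
    """Biased sample estimate of c3(t1, t2) for non-negative lags, after removing the mean."""
    n = len(x)
    mean = sum(x) / n
    y = [v - mean for v in x]
    lag = max(t1, t2)
    return sum(y[i] * y[i + t1] * y[i + t2] for i in range(n - lag)) / n


def cqk_ma_estimate(c3, q):
    """C(q,k) formula: b(k) = c3(q, k) / c3(q, 0), b(0) normalized to 1; q must be known."""
    denom = c3(q, 0)
    return [c3(q, k) / denom for k in range(q + 1)]


def simulate_ma(b, n, seed=0):
    """MA(q) output x(t) = sum_k b(k) w(t-k) driven by zero-mean exponential i.i.d. noise."""
    rng = random.Random(seed)
    q = len(b) - 1
    w = [rng.expovariate(1.0) - 1.0 for _ in range(n + q)]
    return [sum(b[k] * w[t + q - k] for k in range(q + 1)) for t in range(n)]


if __name__ == "__main__":
    b_true = [1.0, -0.5, 0.25]
    exact = cqk_ma_estimate(lambda t1, t2: ma_cumulant3(b_true, t1, t2), q=2)
    x = simulate_ma(b_true, 100_000, seed=1)
    est = cqk_ma_estimate(lambda t1, t2: sample_cumulant3(x, t1, t2), q=2)
    print(exact, est)
```

The first estimate uses theoretical cumulants and returns the true coefficients up to rounding; the second uses sample cumulants, so it is only approximate.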
The strategic impact of META-NET on the regional, national and international level
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for language technology (LT) topics. This paper documents the initiative’s work throughout Europe to boost progress and innovation in our field.
Postprint (published version)
Monolingual and bilingual Spanish-Catalan speech recognizers developed from SpeechDat databases
Under the SpeechDat specifications, the Spanish member of the SpeechDat consortium has recorded a Catalan database that includes one thousand speakers. This communication describes experimental work carried out using both the Spanish and the Catalan speech material.
A speech recognition system has been trained for Spanish using a selection of the phonetically balanced utterances from the 4500 SpeechDat training sessions. Utterances with mispronounced or incomplete words, or with intermittent noise, were discarded. A set of 26 allophones was selected to account for the Spanish sounds, and clustered demiphones were used as context-dependent sub-lexical units. Following the same methodology, a recognition system was trained from the Catalan SpeechDat database, with Catalan sounds described by 32 allophones. Additionally, a bilingual recognition system was built for both Spanish and Catalan. Clustering techniques were used to determine a suitable set of allophones covering both languages simultaneously, and 33 allophones were selected. The training material consisted of the whole Catalan training material plus the Spanish material from the eastern region of Spain (the region where Catalan is spoken).
The performance of the Spanish, Catalan and bilingual systems was assessed under the same framework. The Spanish system performs significantly better than the other systems owing to its better training. The bilingual system performs equivalently to the language-specific systems trained on the eastern Spanish material and on the Catalan SpeechDat corpus.
Peer Reviewed. Postprint (published version)
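The utterance screening described in this abstract can be expressed as a filter over orthographic transcriptions. A minimal sketch, assuming SpeechDat-style markers: a leading `*` for a mispronounced word, `~` for a truncated or incomplete word and `[int]` for intermittent noise. These marker conventions are an assumption to be checked against the corpus documentation, and the function names are hypothetical.

```python
import re

# Assumed SpeechDat-style markers (check the corpus documentation before use):
#   *word  mispronounced word (the pattern also catches "**", unintelligible speech)
#   ~      truncated or incomplete word
#   [int]  intermittent noise
REJECT_PATTERNS = [
    re.compile(r"(^|\s)\*\S"),
    re.compile(r"~"),
    re.compile(r"\[int\]"),
]


def keep_utterance(transcription):
    """True if the transcription carries none of the rejection markers."""
    return not any(p.search(transcription) for p in REJECT_PATTERNS)


def select_training(utterances):
    """Return the ids of usable utterances from (utterance_id, transcription) pairs."""
    return [uid for uid, text in utterances if keep_utterance(text)]


if __name__ == "__main__":
    demo = [
        ("u1", "buenos días"),
        ("u2", "*buenos días"),
        ("u3", "buenos dí~"),
        ("u4", "[int] buenos días"),
        ("u5", "buenos días [sta]"),
    ]
    print(select_training(demo))
```

Only intermittent noise is rejected here, matching the abstract; other noise markers such as `[sta]` pass through unless added to the pattern list.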
New HOS-based parameter estimation methods for speech recognition in noisy environments
The problem of speech recognition in noisy environments is addressed. A recognition system is often used in a noisy environment with no possibility of training it on noisy samples. Classical speech analysis techniques are based on second-order statistics, and their performance decreases dramatically when noise is present in the signal under analysis. New methods based on higher-order statistics (HOS) are applied in a recognition system and compared with the autocorrelation method. Cumulant-based methods perform better than autocorrelation-based methods at low SNR.
Peer Reviewed. Postprint (published version)
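The abstract does not give the authors' equations; one common way to bring HOS into AR (LPC-style) analysis, assumed here, replaces the autocorrelation normal equations with third-order cumulant equations. For x(n) + Σ_i a_i x(n-i) = w(n) with i.i.d. non-Gaussian w, and c(m) = E[x(n)² x(n+m)], multiplying the model at time n+τ by x(n)² gives c(τ) + Σ_i a_i c(τ-i) = 0 for τ = 1..p. Additive Gaussian noise independent of the speech has zero third-order cumulants, so in expectation it drops out of c(m), which is the intuition behind better behavior at low SNR. The function names and the use of a single slice are assumptions; a single slice can be singular for some models.

```python
import random


def third_order_slice(x, max_lag):
    """Sample c(m) = E[x(n)^2 x(n+m)] for m = -max_lag..max_lag, mean removed, divided by N."""
    n = len(x)
    mean = sum(x) / n
    y = [v - mean for v in x]
    return {m: sum(y[i] * y[i] * y[i + m] for i in range(max(0, -m), min(n, n - m))) / n
            for m in range(-max_lag, max_lag + 1)}


def solve_linear(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-15:
            raise ValueError("singular system: this slice does not identify the model")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol


def ar_from_slice(c, p):
    """Solve sum_{i=1..p} a_i c(tau - i) = -c(tau), tau = 1..p, for a_1..a_p."""
    A = [[c[tau - i] for i in range(1, p + 1)] for tau in range(1, p + 1)]
    rhs = [-c[tau] for tau in range(1, p + 1)]
    return solve_linear(A, rhs)


def estimate_ar_cumulant(x, p):
    """Cumulant-based AR(p) estimate from a signal, using the sample slice above."""
    return ar_from_slice(third_order_slice(x, p), p)


def simulate_ar(a, n, seed=0, burn_in=500):
    """AR(p) x(t) = -sum_i a_i x(t-1-i) + w(t), driven by zero-mean exponential (skewed) noise."""
    rng = random.Random(seed)
    x = []
    for t in range(n + burn_in):
        w = rng.expovariate(1.0) - 1.0
        past = sum(a[i] * x[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        x.append(w - past)
    return x[burn_in:]


if __name__ == "__main__":
    x = simulate_ar([-0.6], 20000, seed=1)
    print(estimate_ar_cumulant(x, 1))  # should lie near [-0.6]
```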
Comparison of different order cumulants in a speech enhancement system by adaptive Wiener filtering
The authors study several speech enhancement algorithms based on the iterative Wiener filtering method of Lim and Oppenheim (1978), in which the AR spectral estimation of the speech is carried out using second-order analysis. In their algorithms, however, the authors perform the AR estimation by means of cumulant (third- and fourth-order) analysis. They compare the behavior of the cumulant algorithms with that of the classical autocorrelation algorithm. Results are presented for the noise that allows the best improvement (additive white Gaussian noise) and for the noises that lead to the worst (diesel engine and reactor noise). An exhaustive empirical test shows that the cumulant algorithms outperform the original autocorrelation algorithm, especially at low SNR.
Peer Reviewed. Postprint (published version)
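In the Lim and Oppenheim scheme, each iteration estimates AR parameters of the speech from the current estimate and then applies the non-causal Wiener filter H(ω) = P_s(ω) / (P_s(ω) + P_n(ω)). The sketch below shows only that filtering gain, assuming white noise of known variance and the AR convention s(n) + Σ_k a_k s(n-k) = g·e(n), so that P_s(ω) = g² / |1 + Σ_k a_k e^(-jωk)|². The AR step (autocorrelation or cumulant based) would supply a and g² on each pass; the function names are hypothetical.

```python
import cmath
import math


def ar_psd(a, g2, omega):
    """P_s(w) = g^2 / |1 + sum_k a_k e^{-j w k}|^2 for s(n) + sum_k a_k s(n-k) = g e(n)."""
    denom = 1 + sum(ak * cmath.exp(-1j * omega * (k + 1)) for k, ak in enumerate(a))
    return g2 / abs(denom) ** 2


def wiener_gain(a, g2, noise_var, n_bins=257):
    """Non-causal Wiener gain Ps / (Ps + Pn) on n_bins frequencies in [0, pi], white noise Pn."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    gains = []
    for i in range(n_bins):
        omega = math.pi * i / (n_bins - 1)
        ps = ar_psd(a, g2, omega)
        gains.append(ps / (ps + noise_var))
    return gains


if __name__ == "__main__":
    # Low-pass speech model (a_1 = -0.9) in unit-variance white noise
    h = wiener_gain([-0.9], g2=1.0, noise_var=1.0, n_bins=5)
    print([round(v, 3) for v in h])
```

Bins where the speech model dominates keep a gain near 1, while noise-dominated bins are attenuated; re-estimating a from the filtered output and recomputing the gain gives the iteration.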
Some robust speech enhancement techniques using higher order AR estimation
Peer Reviewed. Postprint (published version)
- …